Robust Multiagent Plan Generation and Execution with Decision-Theoretic Planners
Abstract
…ion for higher-level tasks is delayed until the lower-level problems have been solved. This allows significant minimization of the state space at each level. By abstracting tasks, however, rewards may have to be introduced into a local problem that are not present in the real problem. Although this appears to be extra work, these quasi-rewards can be seen as a way of adding information to the problem that guides the planner more efficiently.

During execution, an agent determines what action to take given its current state in the belief space, a probability distribution over the actual states of the world. To select an action, the policy tree is traversed down a path from the root to an appropriate leaf. If a primitive action is selected, it is executed; if an abstract action is suggested as optimal, its policy is recursively queried. By re-examining policies down the tree at every execution step, this approach removes the need to determine whether a subtask has completed, a determination that is often difficult in POMDP domains, where the agent's knowledge of the current state is uncertain.

Two notions of optimality are often associated with hierarchical solutions to Markov decision problems. The first, hierarchical optimality, is defined as finding the policy with maximal reward from the set of policies consistent with the problem hierarchy. Recursive optimality is weaker: a policy is recursively optimal for a given subproblem if it returns the maximal reward over all policies available for that subproblem, assuming fixed policies for the more concrete tasks below it. The difference between the two is essentially whether lower-level policies are solved with regard to their calling context. Recursively optimal approaches solve each subproblem independently, and may therefore represent a poorer solution; on the other hand, they often allow for greater abstraction of the state space. PolCA is recursively optimal for fully observable MDP problems; for POMDPs, no guarantee of optimality is provided.

2.4 Continual Planning

Within any type of agent planning architecture, agents need to be able to update plans to keep up with changes in the environment, to handle failures in current plans, and to take advantage of new opportunities that arise in the system. This is often implemented by interleaving plan execution and generation and is referred to as continual planning. Continual planning is particularly important in rapidly changing systems, where a plan generated in advance is often obsolete by the time many of its steps would be acted on. One approach seen in several implemented systems relies on planning hierarchically: agents refine and elaborate down a tree of plans that are abstract near the root and concrete near the leaves. Using this approach, agents can postpone the actual execution of portions of a plan until they have a clearer idea of the state of the world, and they can communicate high-level plans to other agents without having to be specific about how they will implement them. This gives other agents a sense of what to expect in the future without tying an agent to performing specific tasks.
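As a concrete illustration of the execution procedure described above, the sketch below pairs a standard POMDP belief update with recursive action selection down a hierarchy of policies. It is a minimal sketch in Python; the Task structure, the callable policies, and all function names are assumptions of this note, not PolCA's actual representation or API.

```python
# A minimal sketch, assuming a discrete state space: Bayesian belief
# tracking plus recursive traversal of a hierarchy of local policies.
# Task, belief_update, and select_action are illustrative names only.

from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


def belief_update(belief: Dict[str, float], action: str, obs: str,
                  T: Callable[[str, str, str], float],
                  O: Callable[[str, str, str], float]) -> Dict[str, float]:
    """Standard POMDP belief update: b'(s') ∝ O(obs|s',a) Σ_s T(s'|s,a) b(s)."""
    new_belief = {
        s2: O(obs, s2, action) * sum(T(s2, s, action) * p
                                     for s, p in belief.items())
        for s2 in belief
    }
    norm = sum(new_belief.values())   # assumes the observation is possible
    return {s: p / norm for s, p in new_belief.items()}


@dataclass
class Task:
    """A node in the task hierarchy: a primitive action, or an abstract
    task with its own local policy over named subtasks."""
    name: str
    # Maps the current belief to the subtask judged optimal; None for primitives.
    policy: Optional[Callable[[Dict[str, float]], str]] = None
    subtasks: Dict[str, "Task"] = field(default_factory=dict)


def select_action(task: Task, belief: Dict[str, float]) -> str:
    """Traverse the policy tree from the root until a primitive is reached.

    Because the traversal is repeated against the current belief on every
    execution step, the agent never has to decide explicitly whether a
    subtask has terminated; the recursion simply bottoms out wherever the
    current policies point.
    """
    if task.policy is None:            # primitive action: execute it
        return task.name
    choice = task.policy(belief)       # abstract: query its local policy
    return select_action(task.subtasks[choice], belief)
```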
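The two notions of optimality can also be stated compactly. The notation below assumes the usual value function V^π and is ours rather than the paper's:

```latex
% Compact restatement of the two optimality notions; the symbols are
% assumptions of this note. \Pi_{\mathcal{H}} denotes the set of
% policies consistent with the hierarchy.

% Hierarchical optimality: best over all hierarchy-consistent policies.
\pi^{*}_{\mathrm{hier}} \in \operatorname*{arg\,max}_{\pi \in \Pi_{\mathcal{H}}} V^{\pi}(s_{0})

% Recursive optimality: each subtask m is solved in isolation, holding
% the (recursively optimal) policies of its children fixed and ignoring
% the calling context.
\pi^{*}_{m} \in \operatorname*{arg\,max}_{\pi_{m}}
    V^{\pi_{m}}\!\left(s \;\middle|\; \{\pi^{*}_{c} : c \in \mathrm{children}(m)\}\right)
```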
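The refine-as-you-execute strategy just described can be captured in a few lines. The sketch below assumes a simple plan-step representation and placeholder expand and execute callbacks; none of these names come from the paper.

```python
# A hedged sketch of the interleaved refine/execute loop over a plan
# tree that is abstract near the root and concrete near the leaves.
# PlanStep, expand, and execute are placeholders for a real plan
# representation, planner, and executor.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PlanStep:
    name: str
    abstract: bool   # abstract steps still need refinement before execution


def continual_plan(agenda: List[PlanStep],
                   expand: Callable[[PlanStep], List[PlanStep]],
                   execute: Callable[[PlanStep], None]) -> None:
    """Interleave plan elaboration and execution.

    Abstract steps are refined only when they come up for execution, so
    later portions of the plan are elaborated against the freshest view
    of the world, and the high-level plan can be shared with other
    agents before the agent commits to any concrete realization of it.
    """
    while agenda:
        step = agenda.pop(0)
        if step.abstract:
            # Just-in-time refinement: splice the newly planned
            # sub-steps in where the abstract step was.
            agenda[:0] = expand(step)
        else:
            execute(step)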
The use of this hierarchical approach also facilitates execution monitoring, the process of analyzing the current state of the world to verify that it is consistent with the state expected to arise from the actions performed. Monitoring itself is generally simple, but problems arise in determining how often agents should perform it: by waiting too long, agents may end up executing plans that are no longer useful, while monitoring after every action can become computationally expensive.
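One way to navigate this trade-off is to trigger monitoring either after a fixed number of unmonitored actions or when the agent's belief becomes too diffuse to trust the plan's assumptions. The sketch below is one such heuristic under stated assumptions; the entropy trigger and both thresholds are illustrative, not a policy proposed by the paper.

```python
# A small sketch of a monitoring schedule between the two extremes
# described above. Thresholds and the entropy trigger are assumptions.

import math
from typing import Dict


def belief_entropy(belief: Dict[str, float]) -> float:
    """Shannon entropy of the belief; higher means more uncertainty."""
    return -sum(p * math.log(p) for p in belief.values() if p > 0.0)


def should_monitor(belief: Dict[str, float], steps_since_check: int,
                   max_steps: int = 5,
                   entropy_threshold: float = 1.0) -> bool:
    """Trade stale plans off against the computational cost of checking."""
    return (steps_since_check >= max_steps
            or belief_entropy(belief) >= entropy_threshold)
```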